Assessing the visual speech perception of sampled-based talking heads
Authors
Abstract
Focusing on flexible applications for limited computing devices, this paper investigates the improvement in visual speech perception obtained by implicitly modeling coarticulation in a sample-based talking head characterized by a compact image database and a viseme-morphing synthesis strategy. Speech intelligibility tests were applied to assess the effectiveness of the proposed context-dependent visemes (CDV) model, comparing it to a simpler model that does not handle coarticulation. The results show that, compared to the simpler model, the CDV approach improves speech intelligibility in situations in which the audio is degraded by noise. Moreover, the CDV model achieves 80% to 90% of the visual speech intelligibility of video of a real talker in the tested cases. Additionally, when the audio is heavily degraded by noise, the results suggest that the mechanisms underlying visual speech perception depend on the quality of the audible information.
Similar works
Comparison of visual speech perception of sampled-based talking heads: adults and children with and without developmental dyslexia
Among other applications, videorealistic talking heads are envisioned as a programmable tool for training skills that involve observing the human face. This work presents partial results of a study conducted with adults and children with and without developmental dyslexia to evaluate the level of speech intelligibility provided by a sample-based talking head model in comparison with unimodal ...
Audio-Visual Prosody: Perception, Detection, and Synthesis of Prominence
In this chapter, we investigate the effects of facial prominence cues, in the form of gestures, when synthesized on animated talking heads. In the first study, a speech intelligibility experiment is conducted in which acoustically degraded speech is presented to 12 subjects through a lip-synchronized talking head carrying head-nod and eyebrow-raising gestures. The experim...
Segmental optical phonetics for human and machine speech processing
That talkers produce optical as well as acoustic speech signals, and that perceivers process both types of signals has become well known. Although perceptual effects due to audiovisual speech integration have been a focus of research involving the visual speech stimulus, relatively little is known about visual-only speech perception and optical phonetic signals. This knowledge is needed to expl...
Audio-visual quality as combination of unimodal qualities: environmental effects on talking heads
Introduction: Talking heads provide a multimodal output component for human-computer interfaces. They consist of visual facial models synchronized with speech-synthesis modules for speech articulation. Because they are reduced to a human head or upper body, articulation is often more clearly visible than with a full human body, owing to the potentially larger display of the head. There...
Cloning synthetic talking heads
The quality of Text-to-Visual-Speech synthesis is judged by how well it matches the visual perception of speech articulators with acoustic speech perception. At the same time, different viewers often prefer different head models for subjective reasons. The traditional facial animation approach tied the parameterization of animation directly to the model. Switching the head model is difficult because a leng...
Journal title:
Volume/Issue:
Pages: -
Publication year: 2013